95% Replicability for Manual Word Sense Tagging
Abstract
People have been writing programs for automatic Word Sense Disambiguation (WSD) for forty years now, yet the validity of the task has remained in doubt. At a first pass, the task is simply defined: a word like bank can mean 'river bank' or 'money bank' and the task is to determine which of these applies in a context in which the word bank appears. The problems arise because most sense distinctions are not as clear as the distinction between 'river bank' and 'money bank', so it is not always straightforward for a person to say what the correct answer is. Thus we do not always know what it would mean to say that a computer program got the right answer. The issue is discussed in detail by (Gale et al., 1992), who identify the problem as one of identifying the 'upper bound' for the performance of a WSD program. If people can only agree on the correct answer x% of the time, a claim that a program achieves more than x% accuracy is hard to interpret, and x% is the upper bound for what the program can (meaningfully) achieve. There have been some discussions as to what this upper bound might be. Gale et al. review a psycholinguistic study (Jorgensen, 1990) in which the level of agreement averaged 68%. But an upper bound of 68% is disastrous for the enterprise, since it implies that the best a program could possibly do is still not remotely good enough for any practical purpose. Even worse news comes from (Ng and Lee, 1996), who re-tagged parts of the manually tagged SEMCOR corpus (Fellbaum, 1998). The taggings matched only 57% of the time. If these represent as high a level of intertagger agreement as one could ever expect, WSD is a doomed enterprise. However, neither study set out to identify an upper bound for WSD and it is far from ideal to use their results in this way. In this paper we report on a study which did aim specifically at achieving as high a level of replicability as possible.
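To make the x% figure concrete, here is a minimal sketch of how a raw agreement rate between two human taggers could be computed. The function name `agreement_rate`, the sense labels, and the sample taggings are all hypothetical illustrations, not data or code from the study, and simple percentage agreement is only one of several possible measures.

```python
def agreement_rate(tags_a, tags_b):
    """Fraction of corpus instances on which two taggers chose the same sense.

    Hypothetical helper: plain percentage agreement, with no correction
    for agreement that would be expected by chance.
    """
    if len(tags_a) != len(tags_b):
        raise ValueError("both taggers must label the same instances")
    if not tags_a:
        raise ValueError("at least one tagged instance is required")
    matches = sum(a == b for a, b in zip(tags_a, tags_b))
    return matches / len(tags_a)


# Invented sense tags for four occurrences of the word "bank".
tagger_1 = ["bank/money", "bank/river", "bank/money", "bank/money"]
tagger_2 = ["bank/money", "bank/river", "bank/river", "bank/money"]

# The taggers agree on three of the four instances.
human_agreement = agreement_rate(tagger_1, tagger_2)
print(f"intertagger agreement: {human_agreement:.0%}")
```

Under the argument above, a program scored against either tagger's labels could not meaningfully claim accuracy beyond this agreement figure.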
The study took place within the context of SENSEVAL, an evaluation exercise for WSD programs. It was, clearly, critical to the validity of SENSEVAL as a whole to establish the integrity of the 'gold standard' corpus against which WSD programs would be judged. Measures taken to maximise the agreement level were:
Similar resources
Word Sense Disambiguation For Acquisition Of Selectional Preferences
The selectional preferences of verbal predicates are an important component of lexical information useful for a number of NLP tasks including disambiguation of word senses. Approaches to selectional preference acquisition without word sense disambiguation are reported to be prone to errors arising from erroneous word senses. Large scale automatic semantic tagging of texts in sufficient quantit...
Managing Uncertainty in Semantic Tagging
Low interannotator agreement (IAA) is a well-known issue in manual semantic tagging (sense tagging). IAA correlates with the granularity of word senses and they both correlate with the amount of information they give as well as with its reliability. We compare different approaches to semantic tagging in WordNet, FrameNet, PropBank and OntoNotes with a small tagged data sample based on the Corpu...
Semantic Annotating of Czech Corpus via WSD
We would like to describe the relationship between word sense disambiguation (WSD) and language resources (LR) working with word senses. We discuss the problem of sense division and tagging. Exploiting specific features of the inflectional languages for WSD is encouraged. We present WSD methods for Czech ambiguous nouns. The advantage of these methods consists in reducing the manual work by usi...
Getting Serious About Word Sense Disambiguation
Recent advances in large-scale, broad coverage part-of-speech tagging and syntactic parsing have been achieved in no small part due to the availability of large amounts of online, human-annotated corpora. In this paper, I argue that a large, human sense-tagged corpus is also critical as well as necessary to achieve broad coverage, high accuracy word sense disambiguation, where the sense distinct...
Design and Prototype of a Large-Scale and Fully Sense-Tagged Corpus
Sense-tagged corpora play a crucial role in Natural Language Processing, especially in research on word sense disambiguation and natural language understanding. A large-scale Chinese sense-tagged corpus is essential, yet the lack of such a corpus is a critical deficiency at the current stage. This paper aims to design a large-scale Chinese full text se...